Authors: Jiacheng Wei, Hao Wang, Jiashi Feng, Guosheng Lin, Kim-Hui Yap
In this paper, we investigate an open research task: generating controllable 3D textured shapes from given textual descriptions. Previous works either require ground-truth caption labeling or extensive optimization time. To resolve these issues, we present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions. Specifically, based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates. Our constructed captions provide high-level semantic supervision for the generated 3D shapes. Further, in order to produce fine-grained textures and increase geometric diversity, we propose to adopt low-level image regularization that aligns fake rendered images with real ones. During the inference phase, our proposed model can generate 3D textured shapes from the given text without any additional optimization. We conduct extensive experiments to analyze each of our proposed components and show the efficacy of our framework in generating high-fidelity, text-relevant 3D textured shapes.
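The pseudo-captioning step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes precomputed, L2-normalized CLIP-style embeddings for the rendered image and for a candidate vocabulary, ranks words by cosine similarity, and fills a fixed template with the top-scoring words. The vocabulary, template, and embeddings below are toy placeholders.

```python
import numpy as np

def build_pseudo_caption(image_emb, vocab_embs, vocab_words, top_k=2,
                         template="a {} {} chair"):
    """Construct a pseudo caption for one rendered image.

    image_emb:  (d,) L2-normalized embedding of the rendered 2D image.
    vocab_embs: (V, d) L2-normalized embeddings of candidate words.
    vocab_words: list of V candidate words (toy stand-in for the CLIP vocabulary).
    """
    # Cosine similarity reduces to a dot product for normalized vectors.
    sims = vocab_embs @ image_emb
    # Indices of the top_k most image-relevant words, best first.
    top = np.argsort(-sims)[:top_k]
    words = [vocab_words[i] for i in top]
    # Fill a hand-written template with the retrieved words.
    return template.format(*words)

# Toy 2-D embeddings chosen so "wooden" and "red" best match the image.
image_emb = np.array([1.0, 0.0])
vocab_words = ["wooden", "red", "metal", "blue"]
vocab_embs = np.array([[1.0, 0.0],
                       [0.8, 0.6],
                       [0.0, 1.0],
                       [-1.0, 0.0]])
print(build_pseudo_caption(image_emb, vocab_embs, vocab_words))
# → a wooden red chair
```

In the actual framework, the resulting captions serve as high-level semantic supervision for the generator, so errors in individual retrieved words are tolerable as long as the caption distribution matches the rendered images on average.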
Paper link: http://arxiv.org/pdf/2303.13273v1